Lecture 10
matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Why do we usually import only pyplot then?
Matplotlib is the whole package; matplotlib.pyplot is a module in matplotlib; and pylab is a module that gets installed alongside matplotlib.
Pyplot provides the state-machine interface to the underlying object-oriented plotting library. The state-machine implicitly and automatically creates figures and axes to achieve the desired plot.
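As a rough illustration of the difference, the sketch below draws the same curve twice: once through the implicit pyplot interface, and once through the explicit object-oriented interface. The data and the variable names (`fig_implicit`, `ax_implicit`, `fig`, `ax`) are chosen here for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

# Implicit (state-machine) interface: pyplot creates a figure and axes
# behind the scenes and tracks them as the "current" ones.
plt.figure()
plt.plot(x, np.sin(x))
plt.title("pyplot interface")
fig_implicit = plt.gcf()   # the figure pyplot is currently tracking
ax_implicit = plt.gca()    # the axes pyplot created for the plot

# Explicit (object-oriented) interface: we hold the figure and axes
# objects ourselves and call methods on them directly.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title("object-oriented interface")
```

Both approaches end up with the same kinds of objects; the explicit form tends to be easier to follow once a figure has more than one subplot.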
Important terminology:
Figure - The entire plot (including subplots)
Axes - Subplot attached to a figure; contains the region for plotting data and the axis objects
Axis - Sets the scale and limits, generates ticks and tick labels
Artist - Everything visible on a figure: text, lines, axis, axes, etc.
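The terminology above maps directly onto matplotlib classes. The following is a small sketch (the toy data and variable names are assumptions made for illustration) that checks how these objects relate to each other.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axes import Axes
from matplotlib.axis import Axis
from matplotlib.figure import Figure

fig, axs = plt.subplots(1, 2)        # one Figure holding two Axes
ax = axs[0]
(line,) = ax.plot([0, 1, 2], [0, 1, 4])

print(isinstance(fig, Figure))       # True: the entire plot
print(len(fig.axes))                 # 2: the Axes attached to the figure
print(isinstance(ax, Axes))          # True
print(isinstance(ax.xaxis, Axis))    # True: each Axes has an x and a y Axis

# The Figure, Axes, Axis, and the plotted line are all Artists
print(all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, line)))  # True
```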
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 101)
# 2x2 grid of subplots; axs is a 2D array of Axes
fig, axs = plt.subplots(2, 2, figsize=(4, 4))
axs[0,0].plot(x, x, "b", label="linear")
axs[0,1].plot(x, x**2, "r", label="quadratic")
axs[1,0].plot(x, x**3, "g", label="cubic")
axs[1,1].plot(x, x**4, "c", label="quartic")
# add a legend to every subplot
for ax in axs.flat:
    ax.legend()
fig.suptitle("More subplots")

x = np.linspace(-2, 2, 101)
# named layout: 'right' appears in both rows, so it spans them
fig, axd = plt.subplot_mosaic([['upleft', 'right'],
                               ['lowleft', 'right']])
axd['upleft'].plot(x, x, "b", label="linear")
axd['lowleft'].plot(x, x**2, "r", label="quadratic")
axd['right'].plot(x, x**3, "g", label="cubic")
axd['upleft'].set_title("Linear")
axd['lowleft'].set_title("Quadratic")
axd['right'].set_title("Cubic")

For quick formatting of plots (scatter and line), format strings are a useful shorthand. They generally take the form '[marker][line][color]':
| character | shape |
|---|---|
| `.` | point |
| `,` | pixel |
| `o` | circle |
| `v` | triangle down |
| `^` | triangle up |
| `<` | triangle left |
| `>` | triangle right |
| … | + more |

| character | line style |
|---|---|
| `-` | solid |
| `--` | dashed |
| `-.` | dash-dot |
| `:` | dotted |

| character | color |
|---|---|
| `b` | blue |
| `g` | green |
| `r` | red |
| `c` | cyan |
| `m` | magenta |
| `y` | yellow |
| `k` | black |
| `w` | white |
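To make the shorthand concrete, here is a minimal sketch combining entries from the three tables above. The data and the particular format strings are chosen for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 11)

fig, ax = plt.subplots()
# circle markers, dashed line, red
(l1,) = ax.plot(x, x, "o--r", label="linear")
# triangle-up markers, dotted line, black
(l2,) = ax.plot(x, x**2, "^:k", label="quadratic")
# point markers in green; no line style is given, so no line is drawn
(l3,) = ax.plot(x, x**3, ".g", label="cubic")
ax.legend()
```

Any of the three parts can be omitted. When a marker is given without a line style, only the markers are drawn.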
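Beyond creating plots from arrays (and lists), addressable objects like dicts and DataFrames can be used via the data argument; the other arguments are then given as keys (or column names) into that object.
To fix the legend clipping we can use the “constrained” layout, which adjusts the spacing automatically,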
np.random.seed(19680801)
d = {'x': np.arange(50),
     'color': np.random.randint(0, 50, 50),
     'size': np.abs(np.random.randn(50)) * 100}
d['y'] = d['x'] + 10 * np.random.randn(50)
plt.figure(
    figsize=(6, 3),
    layout="constrained"
)
plt.scatter(
    'x', 'y', c='color', s='size',
    data=d
)
plt.xlabel("x-axis")
plt.ylabel("y-axis")

Data can also come from DataFrame objects or Series:
import pandas as pd

df = pd.DataFrame({
    "x": np.random.normal(size=10000)
}).assign(
    # y is drawn conditionally on x so that the correlation is 0.75
    y = lambda d: np.random.normal(0.75*d.x, np.sqrt(1-0.75**2), size=10000)
)
fig, ax = plt.subplots(figsize=(5,5))
ax.scatter('x', 'y', c='k', data=df, alpha=0.1, s=0.5)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title("Bivariate normal ($\\rho=0.75$)")

Series objects can also be plotted directly; the index is used for the x-axis values.
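A short sketch of plotting a Series directly is shown below. The random-walk data, the seed, and the index starting at 100 are assumptions made for illustration; the point is that the x values come from the index rather than from positions 0, 1, 2, and so on.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1234)   # arbitrary seed, for reproducibility

# A random walk stored in a Series whose index starts at 100
s = pd.Series(
    np.cumsum(rng.normal(size=50)),
    index=np.arange(100, 150),
)

fig, ax = plt.subplots(figsize=(5, 3), layout="constrained")
(line,) = ax.plot(s)                # x values are taken from s.index
ax.set_xlabel("index")
ax.set_ylabel("value")
```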
Axis scales can be changed via plt.xscale(), plt.yscale(), ax.set_xscale(), or ax.set_yscale(). Supported values include “linear”, “log”, “symlog”, and “logit”.
y = np.sort( np.random.sample(size=1000) )
x = np.arange(len(y))
plt.figure(layout="constrained")
scales = ['linear', 'log', 'symlog', 'logit']
for i, scale in enumerate(scales):
    plt.subplot(221+i)
    plt.plot(x, y)
    plt.grid(True)
    if scale == 'symlog':
        plt.yscale(scale, linthresh=0.01)
    else:
        plt.yscale(scale)
    plt.title(scale)
plt.show()

df = pd.DataFrame({
    "cat": ["A", "B", "C", "D", "E"],
    "value": np.exp(range(5))
})
plt.figure(figsize=(4, 6), layout="constrained")
plt.subplot(321)
plt.scatter("cat", "value", data=df)
plt.subplot(322)
plt.scatter("value", "cat", data=df)
plt.subplot(323)
plt.plot("cat", "value", data=df)
plt.subplot(324)
plt.plot("value", "cat", data=df)
plt.subplot(325)
b = plt.bar("cat", "value", data=df)
plt.subplot(326)
b = plt.bar("value", "cat", data=df)
plt.show()

df = pd.DataFrame({
    "x1": np.random.normal(size=100),
    "x2": np.random.normal(1,2, size=100)
})
plt.figure(figsize=(4, 6), layout="constrained")
plt.subplot(311)
h = plt.hist("x1", bins=10, data=df, alpha=0.5)
h = plt.hist("x2", bins=10, data=df, alpha=0.5)
plt.subplot(312)
h = plt.hist(df, alpha=0.5)
plt.subplot(313)
h = plt.hist(df, stacked=True, alpha=0.5)
plt.show()

df = pd.DataFrame({
    "x1": np.random.normal(size=100),
    "x2": np.random.normal(1,2, size=100),
    "x3": np.random.normal(-1,3, size=100)
}).melt()
plt.figure(figsize=(4, 4), layout="constrained")
plt.boxplot("value", positions="variable", data=df)
Error: ValueError: List of boxplot statistics and `positions` values must have same the length
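One plausible reading of this error is that "value" resolves to a single column (and so a single box), while "variable" resolves to one position per row. A possible workaround, sketched below under that assumption, is to split the long-format data into one array per group before calling boxplot. The seed and the helper names `names` and `values` are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)      # arbitrary seed
df = pd.DataFrame({
    "x1": rng.normal(size=100),
    "x2": rng.normal(1, 2, size=100),
    "x3": rng.normal(-1, 3, size=100),
}).melt()

# Split the long-format data into one array of values per variable
names, values = [], []
for name, group in df.groupby("variable"):
    names.append(name)
    values.append(group["value"].to_numpy())

fig, ax = plt.subplots(figsize=(4, 4), layout="constrained")
bp = ax.boxplot(values)             # one box per array in the list
ax.set_xticklabels(names)           # label the boxes by variable name
```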
To the best of your ability recreate the following plot,
Both Series and DataFrame objects have a plot method which can be used to create visualizations; dtypes determine the type of plot produced. Note these are just pyplot plots and can be formatted as such.
Plot types can be changed via the kind argument or by using one of the DataFrame.plot.<kind> methods.
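As a rough sketch of both routes, the example below plots the same DataFrame with the default plot, with the kind argument, and with a DataFrame.plot.<kind> method. The data, seed, and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)     # arbitrary seed
df = pd.DataFrame({
    "x": rng.normal(size=200),
    "y": rng.normal(1, 2, size=200),
})

# Default for numeric columns: one line per column against the index
ax_line = df.plot()

# Same data, different plot type via the `kind` argument ...
ax_hist = df.plot(kind="hist", bins=20, alpha=0.5)

# ... or via the equivalent DataFrame.plot.<kind> method
ax_scatter = df.plot.scatter(x="x", y="y", s=5)

# Each call returns a matplotlib Axes, so the usual methods apply
ax_scatter.set_title("Scatter via DataFrame.plot.scatter")
```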
The pandas library also provides the plotting submodule, which includes several useful higher-level plots.
cov = np.identity(5)
cov[1,2] = cov[2,1] = 0.5
cov[3,0] = cov[0,3] = -0.8
df = pd.DataFrame(
    np.random.multivariate_normal(mean=[0]*5, cov=cov, size=1000),
    columns = ["x1","x2","x3","x4","x5"]
)
df
           x1        x2        x3        x4        x5
0   -0.675512 -0.072846  0.536075 -0.480851  0.828583
1    0.867784 -0.099565  0.014922 -1.403662  0.465785
2    0.028221 -1.572683 -2.679542 -1.030949 -0.655153
3    0.434528 -0.570881  0.446828 -0.424219 -1.336715
4    0.320779  0.294548  0.834556 -0.261610 -0.648069
..        ...       ...       ...       ...       ...
995 -0.642675  1.500878  0.244987  0.472639  0.444908
996  1.481997  0.902982  1.271029 -1.003460 -0.817025
997  0.001276  1.001031 -0.196185 -0.430334 -0.767048
998 -2.008858  0.978614 -0.347012  1.500994  0.669926
999  1.702875  0.235235  1.581850 -0.721819  0.333595

[1000 rows x 5 columns]
dir(pd.plotting)
['PlotAccessor', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_core', '_matplotlib', '_misc', 'andrews_curves', 'autocorrelation_plot', 'bootstrap_plot', 'boxplot', 'boxplot_frame', 'boxplot_frame_groupby', 'deregister_matplotlib_converters', 'hist_frame', 'hist_series', 'lag_plot', 'parallel_coordinates', 'plot_params', 'radviz', 'register_matplotlib_converters', 'scatter_matrix', 'table']
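As one example of these higher-level plots, the sketch below uses scatter_matrix from the listing above to draw pairwise scatter plots. It uses a smaller three-column data set than the one in the lecture to keep the figure compact; the seed, covariance values, and sizes are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2023)   # arbitrary seed
cov = np.identity(3)
cov[0, 1] = cov[1, 0] = 0.5         # correlate x1 and x2

df = pd.DataFrame(
    rng.multivariate_normal(mean=[0] * 3, cov=cov, size=500),
    columns=["x1", "x2", "x3"],
)

# Pairwise scatter plots, with a histogram of each column on the diagonal
axs = pd.plotting.scatter_matrix(
    df, alpha=0.3, figsize=(5, 5), diagonal="hist"
)
```

The return value is a 2D array of matplotlib Axes, one per pair of columns, so individual panels can be adjusted afterwards with the usual Axes methods.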
Sta 663 - Spring 2023